Graph pre-training strategies have been attracting attention in the graph mining community because of their flexibility in parameterizing graph neural networks (GNNs) without any label information. The key idea lies in encoding valuable information by predicting masked graph signals extracted from the input graph. To balance the importance of diverse graph signals (e.g., nodes, edges, subgraphs), existing approaches are mostly hand-engineered, introducing hyperparameters that re-weight the importance of the graph signals. However, human intervention with sub-optimal hyperparameters often injects additional bias and degrades generalization performance in downstream applications. This paper addresses these limitations from a new perspective: deriving a curriculum for pre-training GNNs. We propose an end-to-end model named MentorGNN that supervises the pre-training process of GNNs over graphs with diverse structures and disparate feature spaces. To comprehend heterogeneous graph signals at different granularities, we propose a curriculum learning paradigm that automatically re-weights graph signals so as to ensure good generalization to the target domain. Moreover, we shed new light on the problem of domain adaptation over relational data (i.e., graphs) by deriving a natural and interpretable upper bound on the generalization error of the pre-trained GNNs. Extensive experiments on a wealth of real graphs validate the performance of MentorGNN.
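To make the curriculum idea concrete, below is a minimal sketch of re-weighting several masked graph-signal reconstruction losses (node-, edge-, and subgraph-level) with an easy-to-hard pacing schedule. The function names, the softmax weighting rule, and the linear pacing schedule are illustrative assumptions, not MentorGNN's actual formulation.

```python
# A minimal sketch of curriculum-style re-weighting of masked graph-signal
# reconstruction losses (node / edge / subgraph level). The pacing schedule and
# the weighting rule below are illustrative assumptions, not MentorGNN's design.
import torch

def curriculum_weighted_loss(signal_losses: dict, step: int, total_steps: int,
                             temperature: float = 1.0) -> torch.Tensor:
    """Combine per-signal losses with weights that shift from 'easy' to 'hard'."""
    names = list(signal_losses)
    losses = torch.stack([signal_losses[n].mean() for n in names])
    # Early in training, favour signals with small loss (easy); later, flatten
    # the weight distribution so harder signals contribute more.
    pace = 1.0 - step / max(total_steps, 1)          # goes from 1 to 0 over training
    weights = torch.softmax(-pace * losses.detach() / temperature, dim=0)
    return (weights * losses).sum()

# Usage with dummy per-signal losses:
losses = {"node": torch.rand(32), "edge": torch.rand(64), "subgraph": torch.rand(8)}
print(curriculum_weighted_loss(losses, step=100, total_steps=1000))
```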
Directly motivated by security-related applications in the homeland security enterprise, we focus on privacy-preserving analysis of graph data, which provides the critical capability to represent rich attributes and relationships. In particular, we discuss two directions, namely privacy-preserving graphs and federated graph learning, which can jointly enable collaboration among multiple parties each holding private graph data. For each direction, we identify both "quick wins" and "hard problems". Finally, we demonstrate a user interface that facilitates model explanation, interpretation, and visualization. We believe that the techniques developed along these directions will significantly enhance the capability of the homeland security enterprise to tackle and mitigate a variety of security risks.
Disinformation refers to false information deliberately spread to influence the general public, and its negative impact on society can be observed in numerous issues, such as political agendas and the manipulation of financial markets. In this paper, we identify prevalent challenges and advances related to automated disinformation detection from multiple aspects, and propose a comprehensive and explainable disinformation detection framework called DISCO. It leverages the heterogeneity of disinformation and addresses the opaqueness of predictions. We then demonstrate DISCO on a real-world fake news detection task with satisfactory detection accuracy and explanation. The demo video and source code of DISCO are now publicly available. We hope our demo can pave the way for addressing the limitations of identification, comprehension, and explainability as a whole.
Shallow GNNs tend to have sub-optimal performance on large graphs or on graphs with missing features. Therefore, it is necessary to increase the depth (i.e., the number of layers) of GNNs to capture more latent knowledge from the input data. On the other hand, adding more layers to a GNN usually degrades its performance due to, e.g., vanishing gradients and over-smoothing. Existing methods (e.g., PairNorm and DropEdge) mainly focus on addressing over-smoothing, but they suffer from drawbacks such as requiring hard-to-obtain knowledge or introducing large training randomness. Moreover, these methods simply apply residual connections (ResNet) to address vanishing gradients, and they overlook an important fact: as the depth grows, the information gathered from distant neighbors becomes dominant over the information gathered from 1-hop and 2-hop neighbors, leading to severe performance degradation. In this paper, we first dive into the ResNet architecture and analyze why ResNet is least suitable for deeper GNNs. We then propose a new residual architecture to mitigate the negative effects caused by ResNet. To address the drawbacks of the existing methods, we introduce a topology-guided graph contrastive loss named TGCL. It leverages node topological information and pulls connected node pairs closer through contrastive-learning regularization to obtain discriminative node representations. Combining the new residual architecture with TGCL, we propose an end-to-end framework named Deeper-GXX for deeper GNNs. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of Deeper-GXX compared with state-of-the-art baselines.
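The topology-guided contrastive idea can be illustrated with a small InfoNCE-style regularizer that pulls the embeddings of connected node pairs together and pushes randomly sampled pairs apart. This is a generic stand-in under assumed hyperparameters (temperature, random negative sampling), not the exact TGCL loss from the paper.

```python
# A minimal sketch of a topology-guided contrastive regularizer: connected node
# pairs (positives) are pulled together, randomly sampled pairs act as negatives.
# This is an InfoNCE-style stand-in, not the paper's exact TGCL formulation.
import torch
import torch.nn.functional as F

def topology_contrastive_loss(z: torch.Tensor, edge_index: torch.Tensor,
                              tau: float = 0.5, num_neg: int = 64) -> torch.Tensor:
    z = F.normalize(z, dim=-1)                       # (N, d) node embeddings
    src, dst = edge_index                            # (2, E) connected pairs
    pos = (z[src] * z[dst]).sum(-1) / tau            # similarity of positive pairs
    neg_idx = torch.randint(0, z.size(0), (src.size(0), num_neg), device=z.device)
    neg = torch.einsum("ed,end->en", z[src], z[neg_idx]) / tau
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)
    labels = torch.zeros(src.size(0), dtype=torch.long, device=z.device)
    return F.cross_entropy(logits, labels)           # the positive pair should win

# Usage: 100 nodes, 16-dim embeddings, a random edge list.
z = torch.randn(100, 16)
edge_index = torch.randint(0, 100, (2, 400))
print(topology_contrastive_loss(z, edge_index))
```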
Despite significant progress in object categorization in recent years, a number of important challenges remain; chief among them are the ability to learn from limited labeled data and to recognize object classes within large, potentially open, sets of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-sized class vocabularies and typically requires a separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and address the problems of supervised, zero-shot, generalized zero-shot, and open-set recognition using a unified framework. Specifically, we propose a weighted maximum-margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. The distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open-set recognition, with up to a 310K-class vocabulary, on the Animals with Attributes and ImageNet datasets.
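The distance-constraint component can be sketched as a hinge loss that requires each labeled sample's embedding to be closer to its own class prototype (vocabulary atom) than to any other atom by a margin. The weighting scheme and the full vocabulary-informed framework are omitted; all names below are illustrative, not the authors' implementation.

```python
# A minimal sketch of the distance-constraint idea: a labelled sample's embedding
# should be closer to its own class prototype (semantic vocabulary atom) than to
# any other atom by a margin. This is an illustrative hinge loss, not the paper's code.
import torch
import torch.nn.functional as F

def vocab_informed_margin_loss(emb: torch.Tensor, labels: torch.Tensor,
                               atoms: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    d = torch.cdist(emb, atoms)                      # (B, V) distances to all atoms
    d_pos = d.gather(1, labels.unsqueeze(1))         # distance to the correct prototype
    viol = (margin + d_pos - d).clamp(min=0)         # hinge violations against all atoms
    mask = 1.0 - F.one_hot(labels, atoms.size(0)).float()  # ignore the positive column
    return (viol * mask).mean()

# Usage: 8 samples embedded in a 300-d semantic space, 1000 vocabulary atoms.
emb = torch.randn(8, 300)
labels = torch.randint(0, 1000, (8,))
atoms = torch.randn(1000, 300)
print(vocab_informed_margin_loss(emb, labels, atoms))
```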
A noisy training set usually leads to degraded generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean-sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method that models the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover the clean data under certain conditions. In general scenarios, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean. To solve this problem, we propose a data-adaptive method, Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) among the selected clean data. To improve efficiency, we further present a splitting algorithm that divides the whole training set into small pieces that can be solved in parallel, making the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample-selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
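The mean-shift intuition behind SPR can be illustrated with a toy alternating solver: fit labels = features · W + gamma with a row-sparse per-sample shift gamma, and keep the samples whose estimated shift is zero as clean. The solver below (a ridge step plus group soft-thresholding) and its parameters are assumptions for illustration, not the paper's scalable algorithm or the Knockoffs filter.

```python
# A minimal sketch of the mean-shift idea: fit Y = X @ W + gamma with a row-sparse
# per-sample shift gamma; samples whose estimated shift is zero are treated as clean.
# The alternating ridge / group soft-thresholding solver here is only illustrative.
import numpy as np

def spr_select_clean(X: np.ndarray, Y: np.ndarray, lam: float = 0.5,
                     ridge: float = 1e-2, iters: int = 50) -> np.ndarray:
    n, d = X.shape
    gamma = np.zeros_like(Y)                          # per-sample mean-shift, shape (n, C)
    for _ in range(iters):
        # Ridge step for W given the current gamma.
        W = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ (Y - gamma))
        R = Y - X @ W                                 # residuals
        # Group soft-threshold each sample's residual row to update gamma.
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        gamma = np.where(norms > lam, (1 - lam / np.maximum(norms, 1e-12)) * R, 0.0)
    return np.linalg.norm(gamma, axis=1) == 0         # True = selected as clean

# Usage: 200 samples, 16-d features, 5 classes with random one-hot labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
Y = np.eye(5)[rng.integers(0, 5, 200)]
print(spr_select_clean(X, Y).mean(), "fraction kept as clean")
```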
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpora, especially for languages with insufficient resources such as Chinese. To this end, we propose CORGI-PM, a Chinese cOrpus foR Gender bIas Probing and Mitigation, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender-bias mitigation, which require models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To the best of our knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment-effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility and maintain low confidence in such black-box systems, a problem exacerbated by poor generalization to out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the black box transparent in a quantifiable way via uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and on OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap on three segmentation datasets and apply it in the clinic. Our work sheds light on clinically safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
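A generic subjective-logic (evidential) head illustrates how per-voxel uncertainty can be obtained: logits are mapped to non-negative evidence, the evidence parameterizes a Dirichlet distribution, and the Dirichlet yields class probabilities plus an explicit uncertainty mass. This is a standard evidential construction offered as a sketch, not EvidenceCap's exact design.

```python
# A minimal sketch of an evidential (subjective-logic) segmentation head: logits are
# mapped to non-negative evidence, evidence defines a Dirichlet, and the Dirichlet
# gives class probabilities plus an uncertainty mass. Not EvidenceCap's exact model.
import torch
import torch.nn.functional as F

def evidential_segmentation_head(logits: torch.Tensor):
    """logits: (B, K, H, W) raw network outputs for K classes."""
    evidence = F.softplus(logits)            # non-negative evidence per class
    alpha = evidence + 1.0                   # Dirichlet concentration parameters
    S = alpha.sum(dim=1, keepdim=True)       # Dirichlet strength
    prob = alpha / S                         # expected class probabilities
    uncertainty = logits.size(1) / S         # K / S: high where evidence is low
    return prob, uncertainty

# Usage: a 3-class toy prediction map.
prob, unc = evidential_segmentation_head(torch.randn(1, 3, 8, 8))
print(prob.shape, unc.shape, float(unc.mean()))
```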
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walks with graph wavelets to encode node structural roles, showing how the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
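The two information measures used above can be computed directly: the von Neumann entropy of a graph's unit-trace Laplacian ("density matrix") and the quantum Jensen-Shannon divergence between two such matrices. The sketch below shows only these measures; how PRI-GSL plugs them into structure learning is not reproduced.

```python
# A minimal sketch of the two information measures PRI-GSL relies on: von Neumann
# entropy of a graph density matrix and the quantum Jensen-Shannon divergence
# between two graphs. Only the measures are shown, not the structure-learning loop.
import numpy as np

def density_matrix(adj: np.ndarray) -> np.ndarray:
    lap = np.diag(adj.sum(1)) - adj          # combinatorial Laplacian
    return lap / np.trace(lap)               # scale to unit trace ("density matrix")

def von_neumann_entropy(rho: np.ndarray) -> float:
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]       # drop numerically-zero eigenvalues
    return float(-(eigvals * np.log2(eigvals)).sum())

def qjs_divergence(rho: np.ndarray, sigma: np.ndarray) -> float:
    mix = 0.5 * (rho + sigma)
    return von_neumann_entropy(mix) - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))

# Usage: compare a 6-node path graph with the complete graph on 6 nodes.
path = np.diag(np.ones(5), 1); path = path + path.T
full = np.ones((6, 6)) - np.eye(6)
print(von_neumann_entropy(density_matrix(path)),
      qjs_divergence(density_matrix(path), density_matrix(full)))
```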
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
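The operation FlashConv accelerates is, at its core, an FFT-based long convolution of an input sequence with an SSM-derived kernel, computed in O(L log L) time. The sketch below shows this baseline FFT convolution; the fused block-FFT kernel and the state-passing algorithm are hardware-level optimizations of the same operation and are not reproduced here.

```python
# A minimal sketch of the FFT-based long convolution at the core of SSM layers such
# as H3: a length-L causal convolution of the input with a per-channel kernel,
# computed via FFTs. FlashConv's fused block-FFT and state passing are not shown.
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """u: (B, H, L) input sequences, k: (H, L) per-channel SSM kernels."""
    L = u.size(-1)
    n = 2 * L                                       # pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(u_f * k_f, n=n)[..., :L]    # keep the causal (first L) outputs
    return y

# Usage: a batch of 2 sequences, 4 channels, length 1024.
u = torch.randn(2, 4, 1024)
k = torch.randn(4, 1024)
print(fft_long_conv(u, k).shape)                    # torch.Size([2, 4, 1024])
```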